In this paper, we consider a partially linear model of the form $Y_t = X_t^\top\theta_0 + g(V_t) + e_t$, $t = 1, \ldots, n$,
where $\{V_t\}$ is a $\beta$-null recurrent Markov chain, $\{X_t\}$ is a sequence of either strictly stationary
or non-stationary regressors and $\{e_t\}$ is a stationary sequence. We propose to estimate both $\theta_0$
and $g(\cdot)$ by a semi-parametric least-squares (SLS) estimation method. Under certain conditions,
we then show that the proposed SLS estimator of $\theta_0$ is still asymptotically normal with the
same rate as for the case of stationary time series. In addition, we also establish an asymptotic
distribution for the nonparametric estimator of the function $g(\cdot)$. Some numerical examples are
provided to show that our theory and estimation method work well in practice.

Keywords: asymptotic theory; nonparametric estimation; null recurrent time series; 
semi-parametric regression 

1. Introduction 

During the past two decades, there has been much interest in various nonparametric and 
semi-parametric techniques to model time series data with possible nonlinearity. Both 
estimation and specification testing problems have been systematically examined for the 
case where the observed time series satisfy a type of stationarity. For more details and 
recent developments, see Robinson [26-28], Fan and Gijbels [8], Härdle et al. [15, 16],
Fan and Yao [9], Gao [10], Li and Racine [21] and the references therein. 

As pointed out in the literature, the stationarity assumption seems too restrictive in 
practice. For example, when tackling economic and financial issues from a time perspec- 
tive, we often deal with non-stationary components. In reality, neither prices nor exchange 
rates follow a stationary law over time. Thus practitioners might feel more comfortable
avoiding restrictions such as stationarity for the processes involved in economic time series models.
There is a large literature on parametric linear and nonlinear models of non-stationary
time series, but very little work has been done in the nonparametric and semi-parametric nonlinear
cases. In nonparametric estimation of nonlinear regression and autoregression of
non-stationary time series models and continuous-time financial models, existing studies
include Phillips and Park [25], Karlsen and Tjøstheim [20], Bandi and Phillips [1], Karlsen
et al. [19], Schienle [30] and Wang and Phillips [32, 33]. Recently, Gao et al. [11, 12] considered
nonparametric specification testing in both autoregression and cointegration models.
Consider a nonparametric regression model of the form

$$Y_t = m(Z_t) + e_t, \quad t = 1, \ldots, n, \tag{1.1}$$

where $\{Y_t\}$ and $\{Z_t\}$ are non-stationary time series, $m(\cdot)$ is an unknown function defined
on $\mathbb{R}^p$ and $\{e_t\}$ is a sequence of strictly stationary errors. We may apply a nonparametric
method to estimate $m(\cdot)$ by

$$\hat m(z) := \hat m_n(z) = \sum_{t=1}^{n} a_{nt}(z)\,Y_t, \tag{1.2}$$

where $\{a_{nt}(z)\}$ is a sequence of positive weight functions; see Karlsen et al. [19] and
Wang and Phillips [32, 33].

As pointed out in the literature, when the dimension of $\{Z_t\}$ is larger than
three, $m(\cdot)$ may not be estimated by $\hat m(z)$ with reasonable accuracy owing to "the curse of
dimensionality". The curse of dimensionality problem has been clearly illustrated in several
books, such as Silverman [31], Hastie and Tibshirani [17], Green and Silverman [13],
Fan and Gijbels [8], Härdle et al. [15], Fan and Yao [9] and Gao [10]. There are several
ways to circumvent the curse of dimensionality. Perhaps one of the most commonly used 
methods is semi-parametric modelling, which is taken to mean partially linear modelling 
in this context. In this paper, we propose using a partially linear model of the form 

$$Y_t = X_t^\top\theta_0 + g(V_t) + e_t, \quad t = 1, \ldots, n, \tag{1.3}$$

where $\theta_0$ is an unknown $d$-dimensional vector; $g(\cdot)$ is some continuous function; $\{X_t =
(x_{t1}, \ldots, x_{td})^\top\}$ is a sequence of either stationary or non-stationary regressors, as assumed
in A1 below; $\{V_t\}$ is a $\beta$-null recurrent Markov process (see Section 2 below for details);
and $\{e_t\}$ is an error process. As discussed in Section 3.2 below, the assumptions on $\{e_t\}$ can be relaxed
to allow it to be either stationary and heteroscedastic or non-stationary and heteroscedastic.

An advantage of the partially linear approach is that any existing information concern- 
ing possible linearity of some of the components can be taken into account in such models. 
Engle et al. [7] were among the first to study this kind of partially linear model, which has since
been studied extensively in both the econometrics and statistics literature. With respect to
developments in semi-parametric time series modelling, various estimation and
testing issues have been discussed for the case where both $\{X_t\}$ and $\{V_t\}$ are strictly
stationary (see, e.g., Härdle et al. [15] and Gao [10]) since the publication of Robinson [27].
For the case where $\{V_t\}$ is a sequence of either fixed designs or strictly stationary regressors
but there is some type of unit root structure in $\{X_t\}$, existing studies, such as Juhl
and Xiao [18], have discussed estimation and testing problems.

To the best of our knowledge, the case where either $\{V_t\}$ is a sequence of non-stationary
regressors or both $\{X_t\}$ and $\{V_t\}$ are non-stationary has not been discussed in the
literature. This paper considers the following two cases: (a) where $\{X_t\}$ is a sequence of
strictly stationary regressors and $\{V_t\}$ is a sequence of non-stationary regressors; and
(b) where both $\{X_t\}$ and $\{V_t\}$ are non-stationary. In both cases, model (1.3) extends some
existing models (Robinson [27], Härdle et al. [15], Juhl and Xiao [18] and Gao [10]) from
the case where $\{V_t\}$ is a sequence of strictly stationary regressors to the case where $\{V_t\}$
is a sequence of non-stationary regressors. However, since the invariant distribution of the $\beta$-null
recurrent Markov process $\{V_t\}$ does not have compact support, the semi-parametric
techniques used for stationary time series cannot be applied directly to our
case. In this paper, we develop a new semi-parametric estimation method to address
these new technicalities when establishing our asymptotic theory.

The main objective of this paper is to derive asymptotically consistent estimators for
both $\theta_0$ and $g(\cdot)$ in model (1.3). In a traditional stationary time series regression
problem, some sort of stationary mixing condition is often imposed on the observations
$(X_t, V_t)$ to establish asymptotic theory. In this paper, we find that the proposed
semi-parametric least-squares (SLS) estimator of $\theta_0$ is still asymptotically normal
with the same rate as in the case of stationary time series when certain smoothness
conditions are satisfied. In addition, our nonparametric estimator of $g(\cdot)$ is also
asymptotically consistent, although its rate of convergence, as expected, is slower than that
for the stationary time series case.

The rest of the paper is organized as follows. The estimation method for $\theta_0$ and $g(\cdot)$
and some necessary conditions are given in Section 2. The main results and some
extensions are provided in Section 3. Section 4 provides a simulation study. An analysis
of an economic data set from the United States is given in Section 5. An outline of the
proofs of the main theorems is given in Section 6. The Supplementary Material section
describes a supplemental document by Chen, Gao and Li [5], which contains the
detailed proofs of the main theorems along with some technical lemmas.

2. Estimation method and assumptions 
2.1. Markov theory 

Let $\{V_t, t \ge 0\}$ be a Markov chain with transition probability $P$ and state space $(E, \mathcal{E})$,
and let $\phi$ be a measure on $(E, \mathcal{E})$. Throughout the paper, $\{V_t\}$ is assumed to be $\phi$-irreducible
Harris recurrent, which makes asymptotics for semi-parametric estimation possible. The
class of stochastic processes we deal with in this paper is not the general class of
null recurrent Markov chains. Instead, we need to impose some restrictions on the tail
behavior of the distribution of the recurrence time $S_\alpha$ of the chain. This leads to the class
we are interested in: the class of $\beta$-null recurrent Markov chains.

Definition. A Markov chain $\{V_t\}$ is $\beta$-null recurrent if there exist a small non-negative
function $f(\cdot)$ (the definition of a small function can be found in the supplemental document),
an initial measure $\lambda$, a constant $\beta \in (0, 1)$ and a slowly varying function $L_f(\cdot)$
such that

$$E_\lambda\Bigl[\sum_{t=0}^{n} f(V_t)\Bigr] = \frac{1}{\Gamma(1+\beta)}\,n^{\beta}L_f(n)\,(1 + o(1)), \tag{2.1}$$

where $E_\lambda$ stands for the expectation with initial distribution $\lambda$ and $\Gamma(\cdot)$ is the usual
gamma function.

It is shown in Karlsen and Tjøstheim [20] that when there exist some small measure $\nu$
and small function $s$ with $\nu(E) = 1$ and $0 \le s(v) \le 1$, $v \in E$, such that

$$P \ge s \otimes \nu, \tag{2.2}$$

then $\{V_t\}$ is $\beta$-null recurrent if and only if

$$P_\alpha(S_\alpha > n) = \frac{1}{\Gamma(1-\beta)\,n^{\beta}L_s(n)}\,(1 + o(1)), \tag{2.3}$$

where $L_s(n) = L_f(n)/\pi_s(f)$ and $\pi_s$ is the invariant measure as defined in Karlsen and Tjøstheim [20].

Furthermore, if (2.3) holds, by Lemma 3.4 in Karlsen and Tjøstheim [20], $\hat\beta := \ln N_C(n)/\ln n$
is a strongly consistent estimator of $\beta$, where $N_C(n) = \sum_{t=1}^{n} I_C(V_t)$, in which $I_A(\cdot)$
is the conventional indicator function and $C$ is a small set as defined in Karlsen and
Tjøstheim [20].
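As an illustration of $\hat\beta = \ln N_C(n)/\ln n$, the sketch below simulates a Gaussian random walk (a 1/2-null recurrent chain) and takes $C = [-1, 1]$ as the small set. The step distribution, the set $C$, the seed and the function name `beta_hat` are assumptions made for illustration. Because the slowly varying factor enters through a logarithm, convergence to $\beta = 1/2$ is slow, so the printed value is only indicative.

```python
import numpy as np

def beta_hat(path, lower=-1.0, upper=1.0):
    """Estimate beta by ln N_C(n) / ln n, with C = [lower, upper] taken as the small set.

    N_C(n) counts the visits of V_1, ..., V_n to C (at least one visit is assumed).
    """
    path = np.asarray(path)
    n = path.size
    n_c = np.count_nonzero((path >= lower) & (path <= upper))  # N_C(n)
    return np.log(n_c) / np.log(n)

# Gaussian random walk V_t = V_{t-1} + v_t, V_0 = 0, with N(0, 1) steps (assumed design)
rng = np.random.default_rng(12345)
n = 1_000_000
v = np.cumsum(rng.standard_normal(n))  # V_1, ..., V_n
print(round(float(beta_hat(v)), 3))
```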

We then introduce a useful decomposition that is critical in the proofs of asymptotics
for nonparametric estimation in null recurrent time series. Let $f$ be a real function
defined on $\mathbb{R}$. We now decompose the partial sum $S_n(f) = \sum_{t=0}^{n} f(V_t)$ into a sum of
independent and identically distributed (i.i.d.) random variables, giving one main part and
two asymptotically negligible minor parts. Define

$$Z_k = \begin{cases} \sum_{t=0}^{\tau_0} f(V_t), & k = 0,\\[4pt] \sum_{t=\tau_{k-1}+1}^{\tau_k} f(V_t), & 1 \le k \le N(n),\\[4pt] \sum_{t=\tau_{N(n)}+1}^{n} f(V_t), & k = (n),\end{cases}$$

where the definitions of $\tau_k$ and $N(n)$ are given in the supplemental document. Then

$$S_n(f) = Z_0 + \sum_{k=1}^{N(n)} Z_k + Z_{(n)}. \tag{2.4}$$

From Nummelin's [24] result, we know that $\{Z_k, k \ge 1\}$ is a sequence of i.i.d. random
variables. In the decomposition (2.4) of $S_n(f)$, $N(n)$ plays the role of the number of
observations. It follows from Lemma 3.2 in Karlsen and Tjøstheim [20] that $Z_0$ and $Z_{(n)}$
converge to zero almost surely when they are divided by $N(n)$. Furthermore, Karlsen
and Tjøstheim [20] show that if (2.2) holds and $\int |f(v)|\,\pi_s(dv) < \infty$, then for an arbitrary
initial distribution $\lambda$ we have

$$\frac{S_n(f)}{N(n)} \to \pi_s(f) \quad \text{almost surely (a.s.)}, \tag{2.5}$$

where $\pi_s(f) = \int f(v)\,\pi_s(dv)$.
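The decomposition (2.4) and the ratio limit (2.5) can be seen concretely for the simple symmetric random walk on the integers. There the state 0 is an atom, so the regeneration times $\tau_k$ are simply the successive visits to 0 and no splitting is needed; this is a simplified stand-in for the split-chain construction in the supplemental document, not the paper's general procedure. With the invariant (counting) measure normalised to give mass 1 to the atom, each integer is visited once per excursion on average, so for $f(v) = I(|v| \le 2)$ we expect $S_n(f)/N(n)$ to approach 5. The function name `regeneration_split` and the seed are illustrative assumptions.

```python
import numpy as np

def regeneration_split(path, f, atom=0):
    """Split S_n(f) = sum_{t=0}^n f(V_t) at the visit times tau_0 < tau_1 < ... to `atom`.

    Returns Z_0, the array (Z_1, ..., Z_{N(n)}) and Z_(n), as in (2.4).
    """
    csum = np.cumsum(f(path).astype(float))
    tau = np.flatnonzero(path == atom)        # regeneration times (tau_0 = 0 if V_0 = atom)
    z0 = csum[tau[0]]                         # t = 0, ..., tau_0
    zk = csum[tau[1:]] - csum[tau[:-1]]       # t = tau_{k-1} + 1, ..., tau_k
    z_tail = csum[-1] - csum[tau[-1]]         # t = tau_{N(n)} + 1, ..., n
    return z0, zk, z_tail

def f(x):
    """Indicator of the set {-2, -1, 0, 1, 2}."""
    return np.abs(x) <= 2

rng = np.random.default_rng(2024)
n = 1_000_000
steps = rng.choice(np.array([-1, 1]), size=n)
path = np.concatenate(([0], np.cumsum(steps)))   # V_0 = 0, V_1, ..., V_n

z0, zk, z_tail = regeneration_split(path, f)
N = zk.size                                      # N(n): number of completed excursions
total = f(path).sum()                            # S_n(f)
print(N, total / N)                              # ratio in (2.5); limit 5 for this f
```

Convergence in (2.5) is slow here: $N(n)$ grows only like $\sqrt{n}$, so even $10^6$ steps yield a few hundred excursions.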
Some useful results for Markov theory are available in Appendix A of the supplemental document.

2.2. Estimation method 

As stated in Assumption A1 below, there exist a function $H(\cdot)$ and a stationary process
$\{U_t\}$ such that $X_t = H(V_t) + U_t$. Since $E[e_t \mid V_t = v] = E[e_t] = 0$ is assumed in A2(ii)
and A3(ii), and likewise $E[U_t \mid V_t = v] = E[U_t] = 0$ by A2(i) and A3(ii), we have

$$E[Y_t \mid V_t = v] = E[(X_t^\top\theta_0 + g(V_t) + e_t) \mid V_t = v] = H(v)^\top\theta_0 + g(v). \tag{2.6}$$

This implies that $\Psi(v) = E[Y_t \mid V_t = v]$ is a function of $v$ independent of $t$ for each fixed $v$
and given $\theta_0$. Thus, $g(v)$ can be represented by

$$g(v) = \Psi(v) - H(v)^\top\theta_0. \tag{2.7}$$

In view of (2.7), we can rewrite model (1.3) as

$$Y_t - \Psi(V_t) = (X_t - H(V_t))^\top\theta_0 + e_t. \tag{2.8}$$

Letting $W_t = Y_t - \Psi(V_t)$ and $U_t = X_t - H(V_t)$, model (2.8) implies

$$W_t = Y_t - \Psi(V_t) = (X_t - H(V_t))^\top\theta_0 + e_t = U_t^\top\theta_0 + e_t. \tag{2.9}$$

Note that $E[W_t] = E[U_t^\top\theta_0] + E[e_t] = 0$. In the case where $\{(X_t, V_t, e_t): t \ge 1\}$ is a
sequence of stationary random variables, various estimation methods for $\theta_0$ and $g(\cdot)$ in
model (1.3) have been studied by many authors (see, e.g., Robinson [27], Härdle et
al. [15] and Gao [10]).

We now propose an SLS estimation method based on kernel smoothing. For every
given $\theta$, we define a kernel estimator of $g(v)$ by

$$g_n(v;\theta) = \sum_{t=1}^{n} w_{nt}(v)\,(Y_t - X_t^\top\theta), \tag{2.10}$$

where $\{w_{nt}(v)\}$ is a sequence of weight functions given by

$$w_{nt}(v) = \frac{K_{v,h}(V_t)}{\sum_{k=1}^{n} K_{v,h}(V_k)} \quad\text{with}\quad K_{v,h}(V_t) = \frac{1}{h}K\left(\frac{V_t - v}{h}\right),$$

in which $K(\cdot)$ is a probability kernel function and $h = h_n$ is a bandwidth parameter.
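As a minimal sketch of (2.10), assuming the uniform kernel on $[-1, 1]$ (the kernel used later in Section 4) and a fixed bandwidth, the weights $w_{nt}(v)$ and $g_n(v;\theta)$ can be computed with NumPy as follows. The function names `kernel_weights` and `g_n` and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def uniform_kernel(u):
    """Uniform kernel K(u) = 1/2 on [-1, 1]."""
    return 0.5 * (np.abs(u) <= 1.0)

def kernel_weights(v, V, h, kernel=uniform_kernel):
    """w_nt(v) = K_{v,h}(V_t) / sum_k K_{v,h}(V_k), with K_{v,h}(V_t) = K((V_t - v)/h)/h.

    `v` may be a scalar or an array of evaluation points; the result has shape (len(v), n).
    Rows whose kernel sum is zero (no V_t within h of v) are returned as zeros.
    """
    v = np.atleast_1d(np.asarray(v, dtype=float))
    K = kernel((np.asarray(V)[None, :] - v[:, None]) / h) / h
    denom = K.sum(axis=1, keepdims=True)
    return np.divide(K, denom, out=np.zeros_like(K), where=denom > 0)

def g_n(v, theta, Y, X, V, h):
    """Kernel estimator (2.10): g_n(v; theta) = sum_t w_nt(v) (Y_t - X_t^T theta)."""
    resid = Y - X @ theta
    return kernel_weights(v, V, h) @ resid

# Toy check: if Y_t - X_t^T theta is constant, g_n(v; theta) returns that constant.
rng = np.random.default_rng(0)
n = 300
V = np.cumsum(rng.normal(0.0, 0.1, n))   # random-walk regressor
X = rng.standard_normal((n, 1))
theta = np.array([1.0])
Y = X @ theta + 2.0
w = kernel_weights(V[:5], V, h=0.2)
est = g_n(V[:5], theta, Y, X, V, h=0.2)
print(w.sum(axis=1), est)
```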

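Replacing $g(V_t)$ by $g_n(V_t;\theta)$ in model (1.3) and applying the SLS estimation method,
we obtain the SLS estimator, $\tilde\theta_n$, of $\theta_0$ by minimizing

$$\sum_{t=1}^{n}\bigl(Y_t - X_t^\top\theta - g_n(V_t;\theta)\bigr)^2$$

over $\theta$. This implies

$$\tilde\theta_n = (\tilde X^\top\tilde X)^{-1}\tilde X^\top\tilde Y, \tag{2.11}$$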






J. Chen, J. Gao and D. Li 



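where $\tilde X^\top = (\tilde X_1, \ldots, \tilde X_n)$, $\tilde X_t = X_t - \sum_{k=1}^{n} w_{nk}(V_t)X_k$, $\tilde Y^\top = (\tilde Y_1, \ldots, \tilde Y_n)$ and $\tilde Y_t = Y_t -
\sum_{k=1}^{n} w_{nk}(V_t)Y_k$. Then $g(\cdot)$ is estimated by

$$\tilde g_n(\cdot) = g_n(\cdot\,;\tilde\theta_n). \tag{2.12}$$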
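The untrimmed estimators (2.11) and (2.12) amount to partialling the kernel fit out of both $Y_t$ and $X_t$ and then running ordinary least squares on the residuals. The sketch below assumes the uniform kernel, a fixed bandwidth $h = 0.3$ and data drawn from the design of Example 4.1, case (i), in Section 4; it illustrates the formulas rather than reproducing the authors' code.

```python
import numpy as np

def uniform_kernel(u):
    """K(u) = 1/2 on [-1, 1]."""
    return 0.5 * (np.abs(u) <= 1.0)

def smoother_matrix(V, h):
    """Matrix whose (t, k) entry is w_nk(V_t); the 1/h factor in K_{v,h} cancels."""
    K = uniform_kernel((V[None, :] - V[:, None]) / h)
    return K / K.sum(axis=1, keepdims=True)   # K(0) > 0 on the diagonal, so no zero rows

def sls_estimate(Y, X, V, h):
    """Untrimmed SLS estimator (2.11) and g_n(V_t; theta_tilde) from (2.12)."""
    W = smoother_matrix(V, h)
    X_tilde = X - W @ X                       # X~_t = X_t - sum_k w_nk(V_t) X_k
    Y_tilde = Y - W @ Y                       # Y~_t = Y_t - sum_k w_nk(V_t) Y_k
    theta, *_ = np.linalg.lstsq(X_tilde, Y_tilde, rcond=None)
    g_fit = W @ (Y - X @ theta)               # g_n(V_t; theta) at the sample points
    return theta, g_fit

# Design of Example 4.1, case (i): theta_0 = 1, g_0(v) = v, AR(1) errors
rng = np.random.default_rng(1)
n = 1000
V = np.cumsum(rng.normal(0.0, 0.1, n))        # random walk (2.25)
X = rng.standard_normal((n, 1))               # X_t = U_t
eta = rng.standard_normal(n)
e = np.empty(n)
e[0] = eta[0]
for t in range(1, n):
    e[t] = 0.5 * e[t - 1] + eta[t]
Y = X[:, 0] + V + e
theta_tilde, g_fit = sls_estimate(Y, X, V, h=0.3)
print(theta_tilde)
```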

This kind of estimation method has been studied in the literature (see, e.g., Härdle et
al. [15]). When $\{V_t\}$ is a sequence of either fixed designs or stationary regressors with
compact support, the conventional weighted least-squares estimators (2.11) and (2.12)
work well in both large and small sample cases. Since the invariant distribution of the $\beta$-null
recurrent Markov chain $\{V_t\}$ might not have compact support, it is difficult to
establish asymptotic results for the estimators (2.11) and (2.12) owing to the random
denominator problem involved in $w_{nt}(\cdot)$. Hence, to establish our asymptotic theory, we
apply the following weighted least-squares estimation method (see, e.g., Robinson [27]).
Define

$$F_t := F_{nt} = I(\hat p_n(V_t) > b_n), \tag{2.13}$$

where $\hat p_n(v)$ is a kernel estimate proportional to $\sum_{k=1}^{n} K_{v,h}(V_k)$, and $\{b_n\}$ is a sequence of
positive numbers satisfying some conditions. Furthermore, let

$$\hat X^\top = (\tilde X_1F_1, \ldots, \tilde X_nF_n) \quad\text{and}\quad \hat Y^\top = (\tilde Y_1F_1, \ldots, \tilde Y_nF_n).$$

Throughout this paper, we propose to estimate $\theta_0$ by

$$\hat\theta_n = (\hat X^\top\hat X)^{-1}\hat X^\top\hat Y \tag{2.14}$$

and $g(\cdot)$ by

$$\hat g_n(\cdot) = g_n(\cdot\,;\hat\theta_n). \tag{2.15}$$
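The trimmed estimator (2.13)-(2.15) can be sketched as follows. The text above only fixes that $\hat p_n(v)$ is built from $\sum_k K_{v,h}(V_k)$; the normalising factor $n^{-\beta}$ with $\beta = 1/2$ used here is a hypothetical choice, as are the uniform kernel, the bandwidth $h = 0.3$ and the threshold $b_n = 0.2$. Observations with $F_t = 0$ simply drop out of the least-squares fit.

```python
import numpy as np

def uniform_kernel(u):
    """K(u) = 1/2 on [-1, 1]."""
    return 0.5 * (np.abs(u) <= 1.0)

def trimmed_sls(Y, X, V, h, b_n, beta=0.5):
    """Trimmed SLS estimator (2.14) with F_t = I(p_n(V_t) > b_n) as in (2.13).

    HYPOTHETICAL: p_n(v) = n**(-beta) * sum_k K_{v,h}(V_k); the n**(-beta) scaling is an
    illustrative assumption, not the paper's definition.
    """
    n = V.size
    K = uniform_kernel((V[None, :] - V[:, None]) / h) / h   # (t, k) entry K_{V_t,h}(V_k)
    p_hat = K.sum(axis=1) / n ** beta
    F = (p_hat > b_n).astype(float)                          # trimming indicators F_t
    W = K / K.sum(axis=1, keepdims=True)                     # w_nk(V_t)
    X_hat = (X - W @ X) * F[:, None]                         # rows X~_t F_t
    Y_hat = (Y - W @ Y) * F                                  # entries Y~_t F_t
    theta, *_ = np.linalg.lstsq(X_hat, Y_hat, rcond=None)
    return theta, F

rng = np.random.default_rng(2)
n = 1000
V = np.cumsum(rng.normal(0.0, 0.1, n))
X = rng.standard_normal((n, 1))
eta = rng.standard_normal(n)
e = np.empty(n)
e[0] = eta[0]
for t in range(1, n):
    e[t] = 0.5 * e[t - 1] + eta[t]
Y = X[:, 0] + V + e                                          # theta_0 = 1, g(v) = v
theta_hat, F = trimmed_sls(Y, X, V, h=0.3, b_n=0.2)
print(theta_hat, F.mean())
```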



2.3. Assumptions 

As may be seen from equation (2.9), further discussion on the semi-parametric estimation 
method depends heavily on the structure of {X t } and {Vt}. This paper is concerned with 
the following two cases: (i) where {X t } is a sequence of strictly stationary regressors and 
independent of {Vt}; and (ii) where {X t } is a sequence of non-stationary regressors with 
the non-stationarity being generated by {Vt}. 

Before stating the main assumptions, we introduce the definition of $\alpha$-mixing
dependence. The stationary sequence $\{Z_t, t = 0, \pm 1, \ldots\}$ is said to be $\alpha$-mixing if $\alpha(n) \to 0$ as
$n \to \infty$, where

$$\alpha(n) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\, B \in \mathcal{F}_{n}^{\infty}} |P(AB) - P(A)P(B)|,$$

in which $\{\mathcal{F}_k^j\}$ denotes a sequence of $\sigma$-fields generated by $\{Z_t, k \le t \le j\}$. Since its
introduction by Rosenblatt [29], $\alpha$-mixing dependence has been a property shared by many
time series models (see, e.g., Withers [34] and Gao [10]). For more details about limit
theorems for $\alpha$-mixing processes, we refer to Lin and Lu [22] and the references therein.

The following assumptions are necessary to derive the asymptotic properties of the 
semi-parametric estimators. 

A1. There exist an unknown function $H(v)$ and a stationary process $\{U_t\}$ such that
$X_t = H(V_t) + U_t$.

A2. (i) Suppose that $\{U_t\}$ is a stationary ergodic Markov process with $E[U_1] = 0$ and
$E[\|U_1\|^{4+\gamma_1}] < \infty$ for some $\gamma_1 > 0$, where $\|\cdot\|$ stands for the Euclidean norm. Furthermore,
we suppose that $\Sigma := E[U_1U_1^\top]$ is positive definite and $\{U_t\}$ is $\alpha$-mixing with

$$\sum_{t=1}^{\infty} \alpha_U^{\gamma_1/(4+\gamma_1)}(t) < \infty, \tag{2.16}$$

where $\alpha_U(t)$ is the $\alpha$-mixing coefficient of $\{U_t\}$.

(ii) Let $\{e_t\}$ be a stationary ergodic Markov process with $E[e_1] = 0$, $\sigma^2 := E[e_1^2] > 0$
and $E[|e_1|^{2+\gamma_2}] < \infty$ for some $\gamma_2 > 0$. Furthermore, the process $\{e_t\}$ is $\alpha$-mixing with

$$\sum_{t=1}^{\infty} \alpha_e^{\gamma_2/(2+\gamma_2)}(t) < \infty, \tag{2.17}$$

where $\alpha_e(t)$ is the $\alpha$-mixing coefficient of $\{e_t\}$.

A3. (i) The invariant measure $\pi_s$ of the $\beta$-null recurrent Markov chain $\{V_t\}$ has
a uniformly continuous density function $p_s(\cdot)$.

(ii) Let $\{U_t\}$, $\{V_t\}$ and $\{e_t\}$ be mutually independent.

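A4. Let $f_{i,k}(\cdot)$ be the density function of

$$V_{i,k} = \psi_{i-k}(V_i - V_k) \quad\text{for } i > k, \quad\text{with } \psi_m = m^{\beta-1}L_s(m) \text{ for } m \ge 1.$$

Suppose that

$$\inf_{\delta>0}\,\limsup_{m\to\infty}\,\sup_{i\ge1}\,\sup_{|v|\le\delta} f_{i+m,i}(v) < \infty. \tag{2.18}$$

Furthermore, there exists a sequence of $\sigma$-fields $\{\mathcal{F}_t, t \ge 0\}$ such that $\{V_t\}$ is adapted
to $\mathcal{F}_t$. With probability 1,

$$\inf_{\delta>0}\,\limsup_{m\to\infty}\,\sup_{i\ge1}\,\sup_{|v|\le\delta} f_{i+m,i}(v \mid \mathcal{F}_i) < \infty, \tag{2.19}$$

where $f_{i,k}(v \mid \mathcal{F}_k)$ is the conditional density function of $V_{i,k}$ given $\mathcal{F}_k$.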
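Condition A4 says that, after scaling by $\psi_m$, the increments $V_{i+m} - V_i$ have densities that stay bounded near zero uniformly in $m$. For the random walk (2.25) with $N(0,1)$ steps, $\beta = 1/2$ and $L_s \equiv 1$ give $\psi_m = m^{-1/2}$, so $\psi_m(V_{i+m} - V_i) \sim N(0,1)$ for every $m$. The sketch below checks this empirically; the step distribution, the window $[-0.25, 0.25]$ (used to approximate the supremum over $|v| \le \delta$) and the block construction are illustrative assumptions.

```python
import numpy as np

def scaled_increment_density(path, m, beta=0.5, delta=0.25):
    """Average density of psi_m (V_{i+m} - V_i) over [-delta, delta], psi_m = m**(beta - 1).

    L_s(m) = 1 is assumed, as for the random walk in Remark 2.2. Non-overlapping
    blocks i = 0, m, 2m, ... give independent increments for a random walk.
    """
    starts = path[:-m:m]
    ends = path[m::m]
    k = min(starts.size, ends.size)
    scaled = m ** (beta - 1.0) * (ends[:k] - starts[:k])
    return np.mean(np.abs(scaled) <= delta) / (2.0 * delta)

rng = np.random.default_rng(3)
path = np.concatenate(([0.0], np.cumsum(rng.standard_normal(2_000_000))))  # V_0, ..., V_n
dens = {m: scaled_increment_density(path, m) for m in (10, 100, 1000)}
print(dens)  # each value should stay near the N(0, 1) average density on [-0.25, 0.25]
```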

A5. (i) The function $g(v)$ is differentiable and its derivative is continuous in $v \in \mathbb{R}$.
In addition, for $n$ large enough,

$$\sum_{t=1}^{n}\int \bigl(g'(\psi_t^{-1}v)\bigr)^2 f_{t,0}(v)\,dv = O(nh^{-1}), \tag{2.20}$$

where $g'(\cdot)$ is the derivative of $g(\cdot)$, and $\psi_t$ and $f_{t,0}(v)$ are defined in A4
above.

(ii) The function $H(v)$ is differentiable and its derivative is also continuous in $v \in \mathbb{R}$.
In addition, for $n$ large enough,

$$\sum_{t=1}^{n}\int \|H'(\psi_t^{-1}v)\|^2 f_{t,0}(v)\,dv = O(nh^{-1}) \tag{2.21}$$

and

$$\sum_{t=1}^{n}\int \|H'(\psi_t^{-1}v)\,g'(\psi_t^{-1}v)\|\,f_{t,0}(v)\,dv = O(n^{1/2-\varepsilon_1}h^{-2}), \tag{2.22}$$

where $\varepsilon_1 > 0$ is small enough.

A6. (i) The probability kernel function $K(\cdot)$ is a continuous and symmetric function
with compact support.

(ii) The sequences $\{h_n\}$ and $\{b_n\}$ both satisfy, as $n \to \infty$,

$$h_n \to 0, \quad b_n \to 0, \quad n^{\varepsilon_0}h_nb_n^{-4} \to 0 \quad\text{and}\quad n^{\beta-\varepsilon_0}h_nb_n^{4} \to \infty \tag{2.23}$$

for some $0 < \varepsilon_0 < \beta/2$. Moreover,

$$\sum_{t=1}^{n} P(\hat p_n(V_t) < b_n) = o(n). \tag{2.24}$$


Remark 2.1. (i) While some parts of assumptions A1-A3 may be non-standard, they
are justifiable in many situations. Condition A1 assumes that $\{X_t\}$ is generated by $X_t =
H(V_t) + U_t$. This is satisfied when the conditional mean function $H(v) = E[X_t \mid V_t = v]$
exists; in this case, A1 holds automatically with $U_t = X_t - E[X_t \mid V_t]$. Condition A1 is
also commonly used in the stationary case (see, e.g., Linton [23]). A simple example is
the univariate case $X_t = V_t + \varepsilon_t$, in which $\{\varepsilon_t\}$ is a sequence of i.i.d. errors with
$E[\varepsilon_t] = 0$ and $E[\varepsilon_t^2] < \infty$, independent of $\{V_t\}$; then $H(v) = E[X_t \mid V_t = v] = v$ and
$U_t = \varepsilon_t$. Note, however, that condition A1 does not include the case where $\{X_t\}$ is a random walk
of the form $X_t = X_{t-1} + \zeta_t$. The case where the non-stationarity in both $\{X_t\}$ and $\{V_t\}$ is
generated by a common random walk structure will need to be discussed separately,
since the methodology involved is likely to be quite different. In Section 3.2 below, we
give some discussion of the case where $\{H(V_t)\}$ is replaced by a bivariate function
of the form $\{H(V_t, t)\}$ to take into account the inhomogeneous case.

(ii) The stationarity assumption on $\{U_t\}$ ensures that the conventional $\sqrt{n}$ rate
of convergence is achievable, and thus it is possible to construct an asymptotically
efficient estimator for $\theta_0$. The stationarity condition on $\{U_t\}$ also requires that $X_t$ can
be decomposed into a non-stationary component represented by $H(V_t)$ and a stationary
component $U_t$. The $\alpha$-mixing dependence in A2 is a mild condition on $\{U_t\}$ and the
error process $\{e_t\}$; Karlsen et al. [19] have made similar assumptions. As discussed in
Section 3.2 below, A2(i) can be relaxed to allow for the inclusion of both endogeneity
and heteroscedasticity. Note that A2(ii) can also be relaxed to allow for the inclusion of
a deterministic function in model (1.3). In such cases, model (1.3) can be naturally
extended to a semi-parametric additive model of the form $Y_t = X_t^\top\theta_0 + g(V_t) + \lambda(\xi_t) + \varepsilon_t$,
as discussed in Section 3.2 below.

(iii) As can be seen from the asymptotic theory below, the existence
of the inverse matrix $\Sigma^{-1}$ is required in Theorem 3.1. In the case where $\{(X_t, V_t)\}$ is
a vector of either independent regressors or stationary time series regressors, Härdle et
al. [15] also assume similar conditions (see Section 1.3 of their book) for establishing the
asymptotic results for the conventional least-squares estimators of $\theta_0$ in (2.11) and of $g(\cdot)$
in (2.12). Condition A3(i) corresponds to analogous conditions on the density function
in the stationary case. A3(ii) imposes mutual independence to avoid some
extremely technical conditions.

Remark 2.2. A4 is similar to but weaker than Assumption 2.3(ii) in Wang and
Phillips [32]. It is easy to check that (2.18) and (2.19) are satisfied with $\beta = 1$ and $L_s(\cdot) = 1$
when $\{V_t\}$ is a sequence of either i.i.d. or stationary dependent variables. Consider the
random walk case defined by

$$V_t = V_{t-1} + v_t, \quad t = 1, 2, \ldots, \quad V_0 = 0, \tag{2.25}$$

where $\{v_t\}$ is a sequence of i.i.d. random variables. The random walk model (2.25) is
very important in economics and finance and has been studied by many authors. It
corresponds to a 1/2-null recurrent process, and it is easy to check that (2.18) and (2.19)
are satisfied with $\beta = 1/2$, $L_s(n) = 1$ and $\mathcal{F}_k = \sigma(v_i, i \le k)$. On the other hand, (2.18)
and (2.19) can be formulated in terms of the transition probability. For example, assume
that the transition probability of the Markov process $\{V_t\}$ is defined by

$$P(x, dy) = f(x \mid y)\,dy.$$

Let $f_k(\cdot)$ be the marginal density of $V_k$ and $f_m(x \mid y)$ be the $m$-step transition density.
Then

$$f_{i+m,i}(v) = \psi_m^{-1}\int f_m\bigl(x \mid x + \psi_m^{-1}v\bigr)\,f_i(x)\,dx,$$

where $\psi_m$ is defined in A4.

Remark 2.3. (i) A5(i) is assumed to make sure that the bias term of the nonparametric
estimator is negligible when establishing the asymptotic distribution of the
semi-parametric estimator $\hat\theta_n$. When $\{V_t\}$ is the random walk process defined by (2.25),
condition A5(i) can be verified. If

$$g(v) = g_0 + g_1v + g_2|v|^{1+\delta_0}, \quad 0 < \delta_0 < 1/2, \tag{2.26}$$

$n^{\delta_0}h = O(1)$ and $f_{t,0}(v) = O(v^{-(1+2\delta_0+\varsigma)})$ for some $\varsigma > 0$ as $t \to \infty$ and $v \to \infty$, we can
show by A4 that

$$\sum_{t=1}^{n}\int \bigl(g'(\psi_t^{-1}v)\bigr)^2 f_{t,0}(v)\,dv = O\Bigl(\sum_{t=1}^{n} t^{\delta_0}\Bigr) = O(n^{1+\delta_0}),$$

which, since $n^{\delta_0}h = O(1)$, implies (2.20).

(ii) Similarly, condition A5(ii) is also verifiable. Consider the case where

$$g(v) = g_0 + g_1v \quad\text{and}\quad H(v) = a_0 + a_1v + a_2|v|^{1+\delta_1}, \quad 0 < \delta_1 < 1/2,$$

in which $a_k$, $k = 0, 1, 2$, are $d$-dimensional vectors, $n^{1/2+\delta_1-\varepsilon_1}h^2 = O(1)$ ($\varepsilon_1 < 1/2 - \delta_1$)
and $f_{t,0}(v) = O(v^{-(1+2\delta_1+\varsigma)})$ for some $\varsigma > 0$ as $t \to \infty$ and $v \to \infty$. We can also show
that (2.21) and (2.22) hold for the random walk case. The detailed calculation is similar
to that in Remark 2.3(i) above.

Remark 2.4. (i) Condition A6(i) is a quite natural condition on the kernel function
and has been used by many authors for the stationary time series case. The first part
of A6(ii) requires that the rate of $b_n^{-4} \to \infty$ is slower than that of $n^{\varepsilon_0}h \to 0$, and that the rate of
$b_n^{4} \to 0$ is slower than that of $n^{\beta-\varepsilon_0}h \to \infty$. Such conditions are satisfied in various cases.
Letting $b_n = c_b\log^{-1}(n)$ and $h_n = c_hn^{-\zeta_0}$ for some $c_b > 0$, $c_h > 0$ and $\varepsilon_0 < \zeta_0 < \beta - \varepsilon_0$,
the first part of A6(ii) holds automatically.
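Whether a choice of $h_n$ and $b_n$ satisfies (2.23) reduces to comparing exponents when every sequence has the form $n^a(\log n)^b$. The helper below is a hypothetical utility, not part of the paper; it encodes that comparison for $b_n = c_b\log^{-1}(n)$ and $h_n = c_hn^{-\zeta_0}$, dropping the positive constants because they do not affect the limits. It reproduces the conclusion of Remark 2.4(i) that the first part of A6(ii) holds when $\varepsilon_0 < \zeta_0 < \beta - \varepsilon_0$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rate:
    """A sequence of the form n**a * (log n)**b."""
    a: float
    b: float

    def __mul__(self, other):
        return Rate(self.a + other.a, self.b + other.b)

    def __pow__(self, k):
        return Rate(self.a * k, self.b * k)

def tends_to_zero(r):
    return r.a < 0 or (r.a == 0 and r.b < 0)

def tends_to_infinity(r):
    return r.a > 0 or (r.a == 0 and r.b > 0)

def check_a6_rates(beta, eps0, zeta0):
    """Check (2.23) for b_n = c_b / log n and h_n = c_h * n**(-zeta0)."""
    h = Rate(-zeta0, 0.0)
    b = Rate(0.0, -1.0)
    return {
        "h_n -> 0": tends_to_zero(h),
        "b_n -> 0": tends_to_zero(b),
        "n^eps0 h_n b_n^-4 -> 0": tends_to_zero(Rate(eps0, 0.0) * h * b ** -4),
        "n^(beta-eps0) h_n b_n^4 -> inf": tends_to_infinity(Rate(beta - eps0, 0.0) * h * b ** 4),
    }

print(check_a6_rates(beta=0.5, eps0=0.1, zeta0=0.3))
```

Working with exponents rather than numerical values matters here: for example, $n^{-0.2}(\log n)^4$ tends to zero but still exceeds 1000 at $n = 10^6$.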

(ii) The second part of A6(ii) is imposed to ensure that the trimming procedure works
in this kind of problem. When $\{V_t\}$ is a sequence of i.i.d. random variables with
compact support $S$, it is easy to show that (2.24) holds if $\inf_{x\in S} p(x) > 0$, where $p(\cdot)$ is
the density function of $V_t$. In the case where $\{V_t\}$ is an i.i.d. sequence without
compact support, Robinson [27] gives different conditions under which (2.24) holds. We
can show that condition A6(ii) is verifiable when $\{V_t\}$ is a random walk of the
form (2.25). Since the verification is quite technical, the details are given in the last part
of Appendix C in the supplemental document.



3. The main results and their extensions 
3.1. Asymptotic theory 

We now establish the asymptotic distribution of the estimator $\hat\theta_n$ in the following
theorem, which covers two cases: (a) $\{V_t\}$ is a sequence of non-stationary
regressors and $\{X_t\}$ is a sequence of strictly stationary regressors independent
of $\{V_t\}$; and (b) both $\{X_t\}$ and $\{V_t\}$ are non-stationary.

Theorem 3.1. Let A1-A5(i) and A6 hold. In addition, suppose that $\Sigma_{e,U} := \sigma^2\Sigma +
2\sum_{t=2}^{\infty} E[e_1e_t]\,E[U_1U_t^\top]$ is positive definite.

(i) If $\{X_t\}$ is strictly stationary and independent of $\{V_t\}$, then as $n \to \infty$,

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{D} N\bigl(0, \Sigma^{-1}\Sigma_{e,U}\Sigma^{-1}\bigr). \tag{3.1}$$

(ii) Suppose that both $\{X_t\}$ and $\{V_t\}$ are non-stationary. If, in addition, A5(ii) is
satisfied, then (3.1) still holds.

Remark 3.1. (i) Theorem 3.1 shows that normality can still be the asymptotic
distribution of the SLS estimator even when non-stationarity is involved. Theorem 3.1(ii)
further shows that the conventional $\sqrt{n}$ rate is still achievable when the
non-stationarity in $\{X_t\}$ is generated purely by $\{V_t\}$ and certain conditions are imposed
on the functional forms of $H(\cdot)$ and $g(\cdot)$.

(ii) Since the asymptotic distribution and asymptotic variance in (3.1) are mainly
determined by the stationary sequences $\{e_t\}$ and $\{U_t\}$, the above conclusion extends
Theorem 2.1.1 of Härdle et al. [15] for the case where $\{X_t\}$, $\{V_t\}$ and $\{e_t\}$ are all strictly
stationary. In addition, when $\{X_t\}$ is assumed to be strictly stationary and independent
of $\{V_t\}$, as in Theorem 3.1(i), the matrix $\Sigma$ reduces to the covariance matrix of $\{X_t\}$,
namely $\Sigma = E[(X_1 - E[X_1])(X_1 - E[X_1])^\top]$.

Remark 3.2. (i) Theorem 3.1 establishes an asymptotically normal estimator for $\theta_0$. As
in the independent and stationary sample cases, an interesting issue is how to construct an
asymptotically efficient estimator for $\theta_0$. As discussed in Chen [4] and Härdle et al. [15],
it can be shown that $\hat\theta_n$ achieves the smallest possible variance $\sigma^2\Sigma^{-1}$ when both $\{U_t\}$
and $\{e_t\}$ are independent and $e_t \sim N(0, \sigma^2)$.

(ii) Since the publication of the book by Bickel et al. [3], there has been increasing
interest in asymptotic efficiency in semi-parametric models. Several notions of
asymptotic efficiency are relevant in this kind of semi-parametric setting. Härdle et
al. [15] consider several types of asymptotically efficient estimators in Chapters 2 and 5
of their book. Linton [23] considers second-order efficiency. Bhattacharya and Zhao [2]
establish an asymptotically efficient estimator without requiring finite variance. Chen [6]
discusses asymptotic efficiency in nonparametric and semi-parametric models using sieve
estimation.

(iii) As shown in the literature, the establishment of an asymptotically efficient estima- 
tor in this kind of semi-parametric setting requires the availability of uniform convergence 
of nonparametric estimation. Since such uniform convergence results are not readily avail- 
able and applicable in this kind of non-stationary situation, we wish to establish some 
necessary uniform convergence results first before we may be able to address the issue of 
asymptotic efficiency in future research. 

The asymptotic distribution of $\hat g_n(v)$ is given in Theorem 3.2 below.

Theorem 3.2. (i) Let the conditions of Theorem 3.1(i) hold. If, in addition, $g(\cdot)$ is twice
differentiable, the second derivative $g''(v)$ is continuous in $v$ and $n^{\beta/5+\varepsilon}h = o(1)$ for
some $\varepsilon > 0$, then as $n \to \infty$,

$$\sqrt{\sum_{t=1}^{n} K\left(\frac{V_t - v}{h}\right)}\;\bigl(\hat g_n(v) - g(v)\bigr) \xrightarrow{D} N\left(0, \sigma^2\int K^2(u)\,du\right). \tag{3.2}$$

(ii) Let the conditions of Theorem 3.1(ii) hold. If, in addition, $g(\cdot)$ is twice differentiable,
the second derivative $g''(v)$ is continuous in $v$ and $n^{\beta/5+\varepsilon}h = o(1)$ for some $\varepsilon > 0$,
then (3.2) remains true.

Remark 3.3. The asymptotic distribution in (3.2) is similar to the corresponding results
obtained by Karlsen et al. [19] and Wang and Phillips [32]. The rate of convergence
is slower than that for the stationary time series case, since $\sum_{t=1}^{n} K\left(\frac{V_t - v}{h}\right) = O_P(N(n)h)$
and $N(n)$ grows more slowly than $n$. The condition $n^{\beta/5+\varepsilon}h = o(1)$ ensures
that the bias term of the nonparametric estimator $\hat g_n(v)$ is negligible.
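Result (3.2) suggests an approximate pointwise interval $\hat g_n(v) \pm z_{1-\alpha/2}\{\hat\sigma^2\int K^2(u)\,du / \sum_{t=1}^{n} K((V_t - v)/h)\}^{1/2}$. The sketch below assumes the uniform kernel on $[-1, 1]$, for which $\int K^2(u)\,du = 1/2$, and takes an estimate $\hat\sigma^2$ of $\sigma^2$ as given (for instance, a residual variance). Neither this interval nor a variance estimator is proposed in the paper; the function name `pointwise_ci` is hypothetical, and the bias is ignored, which Theorem 3.2 supports only under $n^{\beta/5+\varepsilon}h = o(1)$.

```python
import numpy as np
from statistics import NormalDist

def uniform_kernel(u):
    """K(u) = 1/2 on [-1, 1]; the integral of K^2 equals 1/2."""
    return 0.5 * (np.abs(u) <= 1.0)

def pointwise_ci(v, g_hat_v, V, h, sigma2, level=0.95):
    """Normal-approximation interval for g(v) motivated by (3.2), ignoring bias.

    The variance of g_hat(v) is approximated by sigma2 * int K^2 / sum_t K((V_t - v)/h).
    """
    denom = uniform_kernel((np.asarray(V, dtype=float) - v) / h).sum()
    if denom <= 0:
        raise ValueError("no observations within one bandwidth of v")
    se = np.sqrt(sigma2 * 0.5 / denom)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return g_hat_v - z * se, g_hat_v + z * se

# Toy call: eight observations at v = 0, so sum_t K(0) = 4 and se = sqrt(0.5 / 4)
lo, hi = pointwise_ci(v=0.0, g_hat_v=1.0, V=np.zeros(8), h=1.0, sigma2=1.0)
print(lo, hi)
```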






3.2. Some extensions 

In this section, we discuss in some detail the possible extensions raised in
Remark 2.1(ii) and (iii), and we also suggest some other extensions.

Instead of considering a variety of extensions of model (1.3) and Theorems 3.1 and 3.2,
this section considers several extensions that are naturally based on relaxing A1-A3
to Assumptions 3.1-3.3 below, respectively. As a consequence, the extended models
proposed below allow for the inclusion of endogeneity, heteroscedasticity and deterministic
trending.

Assumption 3.1. There exist a bivariate function $H(\cdot,\cdot)$ and a stationary process $\{U_t\}$
such that $X_t = H(V_t, t/n) + U_t$ for $1 \le t \le n$.

Assumption 3.2. (i) Let A2(i) hold.

(ii) Let $\{e_t\}$ be of the form of either $e_t = \sigma(\zeta_t)\varepsilon_t$ or $e_t = \lambda(\xi_t) + \varepsilon_t$ with $\zeta_t = U_t$ or $V_t$ and
$\xi_t = U_t$ or $\xi_t = t/n$, in which $\{\varepsilon_t\}$ is a stationary ergodic Markov process satisfying A2(ii)
and both $\sigma(\cdot)$ and $\lambda(\cdot)$ are smooth functions.

Assumption 3.3. (i) Let A3(i) hold.

(ii) Let $\{V_t\}$ be independent of both $\{U_t\}$ and $\{\varepsilon_t\}$. In addition, $E[\varepsilon_t \mid U_t] = 0$.

While it is difficult to consider general non-stationarity for $\{X_t\}$, it is possible
to consider a general inhomogeneous case in Assumption 3.1 that allows for a bivariate
functional form of $H(\cdot,\cdot)$, such that the non-stationarity of $\{X_t\}$ is caused both by the
involvement of $\{V_t\}$ and by the dependence on $t$. In this case, $H(\cdot,\cdot)$ may be estimated
nonparametrically by

$$\hat H(v,\tau) = \sum_{t=1}^{n} W_{nt}(v,\tau)X_t \quad\text{with}\quad W_{nt}(v,\tau) = \frac{K_{v,\tau}(V_t, t)}{\sum_{k=1}^{n} K_{v,\tau}(V_k, k)}, \tag{3.3}$$

where $K_{v,\tau}(V_t, t) = \frac{1}{h_1h_2}K_1\left(\frac{V_t - v}{h_1}\right)K_2\left(\frac{t/n - \tau}{h_2}\right)$, in which both $K_i(\cdot)$ are probability kernel
functions and $h_i$ are bandwidth parameters for $i = 1, 2$.
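The bivariate smoother (3.3) is a Nadaraya-Watson-type estimator with a product kernel in $V_t$ and rescaled time $t/n$. A minimal sketch, assuming uniform kernels for both $K_1$ and $K_2$ and a noise-free toy design $X_t = V_t + 2t/n$ (so that $H(v,\tau) = v + 2\tau$) chosen only to make the output easy to check; the function name `H_hat` is hypothetical.

```python
import numpy as np

def uniform_kernel(u):
    """K(u) = 1/2 on [-1, 1]."""
    return 0.5 * (np.abs(u) <= 1.0)

def H_hat(v, tau, V, X, h1, h2, K1=uniform_kernel, K2=uniform_kernel):
    """Estimate H(v, tau) by sum_t W_nt(v, tau) X_t as in (3.3).

    K_{v,tau}(V_t, t) = K1((V_t - v)/h1) * K2((t/n - tau)/h2) / (h1 * h2).
    Returns NaN entries if no observation falls in the kernel window.
    """
    V = np.asarray(V, dtype=float)
    X = np.asarray(X, dtype=float).reshape(V.size, -1)
    n = V.size
    t = np.arange(1, n + 1)
    k = K1((V - v) / h1) * K2((t / n - tau) / h2) / (h1 * h2)
    s = k.sum()
    if s <= 0:
        return np.full(X.shape[1], np.nan)
    return (k / s) @ X

rng = np.random.default_rng(4)
n = 1000
V = np.cumsum(rng.normal(0.0, 0.1, n))
X = V + 2.0 * np.arange(1, n + 1) / n          # H(v, tau) = v + 2 tau, no noise
v0, tau0, h1, h2 = V[499], 0.5, 0.3, 0.1       # V[499] is V_t at t = 500, i.e. t/n = 0.5
est = H_hat(v0, tau0, V, X, h1, h2)
print(est, v0 + 2 * tau0)
```

With uniform kernels only points satisfying $|V_t - v| \le h_1$ and $|t/n - \tau| \le h_2$ are averaged, so in this toy design the estimate can differ from $v + 2\tau$ by at most $h_1 + 2h_2$.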

Assumption 3.2(ii) allows for the inclusion of endogeneity, heteroscedasticity and deterministic
trending. In the case where we have either $e_t = \sigma(U_t)\varepsilon_t$ or $e_t = \sigma(V_t)\varepsilon_t$ with
$E[\varepsilon_t \mid U_t] = E[\varepsilon_t \mid V_t] = 0$, it follows that either $E[e_t \mid V_t] = E[\sigma(V_t)\varepsilon_t \mid V_t] = \sigma(V_t)E[\varepsilon_t \mid V_t] =
0 = E[e_t]$ or $E[e_t \mid V_t] = E[\sigma(U_t)\varepsilon_t] = E[e_t]$. This shows that $E[e_t \mid V_t] = E[e_t]$ holds in both
cases. In addition, Assumption 3.2(ii) also includes the case where $e_t = \lambda(t/n) + \varepsilon_t$ or
$e_t = \lambda(U_t) + \varepsilon_t$. In such cases, we obviously have $E[e_t \mid V_t] = E[e_t]$.

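Under Assumptions 3.1-3.3, model (1.3) can be written as either

$$\begin{aligned} Y_t &= X_t^\top\theta_0 + g(V_t) + \sigma(\zeta_t)\varepsilon_t,\\ X_t &= H(V_t, t/n) + U_t, \end{aligned} \tag{3.4}$$

where $\zeta_t = U_t$ or $V_t$, or

$$\begin{aligned} Y_t &= X_t^\top\theta_0 + g(V_t) + \lambda(\xi_t) + \varepsilon_t,\\ X_t &= H(V_t, t/n) + U_t, \end{aligned} \tag{3.5}$$

where $\xi_t = U_t$ or $\xi_t = t/n$.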

Estimation of $\theta_0$ and $g(\cdot)$ in (3.4) is similar to what has been proposed in Section 2.
Since model (3.5) is a semi-parametric additive model, one will need to estimate $\theta_0$ based
on the form $Y_t = X_t^\top\theta_0 + G(V_t, \xi_t) + \varepsilon_t$ with $G(v, \tau) = g(v) + \lambda(\tau)$ before both $g(\cdot)$ and $\lambda(\cdot)$
can be individually estimated using the marginal integration method as developed in
Section 2.3 of Gao [10].

In both cases, one will need to replace $\{w_{nt}(v)\}$ in (2.10) and $\hat p_n(v)$ in (2.13) by
$\{W_{nt}(v, \tau)\}$ of (3.3) and $\hat p_n(v, \tau) = \sum_{k=1}^{n} K_{v,\tau}(V_k, k)$, respectively.

Since the establishment and the proofs of the corresponding results of Theorems 3.1 
and 3.2 for models (3.4) and (3.5) involve more technicalities than those given in Ap- 
pendices B and C of the supplemental document, we wish to leave the discussion of 
models (3.4) and (3.5) to a future paper. 



4. Simulation study 

To illustrate our estimation procedure, we consider a simulated example and a real data
example in this section. Throughout the section, the uniform kernel $K(v) = \frac{1}{2}I(|v| \le 1)$
is used. A difficult problem in simulation is the choice of a proper bandwidth. From the
asymptotic results in Section 3, the rates of convergence differ
from those in the stationary case, with $n$ replaced by $N(n)$. In practice, we have
found it useful to use a semi-parametric cross-validation method (see, e.g., Section 2.1.3
of Härdle et al. [15]).
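A leave-one-out version of the semi-parametric cross-validation idea can be sketched as follows: for each candidate $h$, $g$ is smoothed without observation $t$, $\theta$ is fitted by (2.11) with these weights, and the average squared prediction error $\{Y_t - X_t^\top\hat\theta - \hat g_{-t}(V_t)\}^2$ is recorded. The exact criterion in Section 2.1.3 of Härdle et al. [15] may differ in detail; the bandwidth grid, the uniform kernel, the seed and the data design (Example 4.1, case (ii)) are assumptions for illustration.

```python
import numpy as np

def uniform_kernel(u):
    """K(u) = 1/2 on [-1, 1]."""
    return 0.5 * (np.abs(u) <= 1.0)

def loo_smoother(V, h):
    """Leave-one-out weights: row t holds w_nk(V_t) for k != t (zero row if V_t is isolated)."""
    K = uniform_kernel((V[None, :] - V[:, None]) / h)
    np.fill_diagonal(K, 0.0)
    s = K.sum(axis=1, keepdims=True)
    W = np.divide(K, s, out=np.zeros_like(K), where=s > 0)
    return W, s[:, 0] > 0

def cv_bandwidth(Y, X, V, grid):
    """Return the h in `grid` minimising a leave-one-out semi-parametric CV score."""
    scores = []
    for h in grid:
        W, ok = loo_smoother(V, h)
        Xt = (X - W @ X)[ok]
        Yt = (Y - W @ Y)[ok]
        theta, *_ = np.linalg.lstsq(Xt, Yt, rcond=None)
        resid = Yt - Xt @ theta          # = Y_t - X_t^T theta - g_{-t}(V_t; theta)
        scores.append(float(np.mean(resid ** 2)))
    return grid[int(np.argmin(scores))], scores

# Data from Example 4.1, case (ii): X_t = V_t + U_t, theta_0 = 1, g_0(v) = v
rng = np.random.default_rng(5)
n = 500
V = np.cumsum(rng.normal(0.0, 0.1, n))
U = rng.standard_normal(n)
eta = rng.standard_normal(n)
e = np.empty(n)
e[0] = eta[0]
for t in range(1, n):
    e[t] = 0.5 * e[t - 1] + eta[t]
X = (V + U)[:, None]
Y = X[:, 0] + V + e
grid = [0.1, 0.2, 0.4, 0.8]
h_cv, scores = cv_bandwidth(Y, X, V, grid)
print(h_cv, scores)
```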
Table 1. Simulation results for the estimator of $\theta_0$

| $n$ | $H(\cdot)$ | AE | SE |
|---|---|---|---|
| 200 | $H(v) = 0$ | 0.0137 | 0.0144 |
| 700 | $H(v) = 0$ | 0.0117 | 0.0086 |
| 1200 | $H(v) = 0$ | 0.0064 | 0.0062 |
| 200 | $H(v) = v$ | 0.0172 | 0.0215 |
| 700 | $H(v) = v$ | 0.0149 | 0.0126 |
| 1200 | $H(v) = v$ | 0.0079 | 0.0108 |


Example 4.1. Consider a partially linear time series model of the form

$$Y_t = X_t \theta + g(V_t) + e_t, \quad t = 1, 2, \ldots, n, \qquad (4.1)$$

where $V_t = V_{t-1} + v_t$ with $V_0 = 0$, $\{v_t\}$ is a sequence of i.i.d. random variables generated from $N(0, 0.1^2)$, and $\{e_t\}$ is generated by an AR(1) model of the form

$$e_t = 0.5 e_{t-1} + \eta_t,$$

in which $\{\eta_t\}$ is a sequence of i.i.d. random variables generated from $N(0, 1)$, and $\{v_t\}$ and $\{\eta_t\}$ are mutually independent. We choose the true value of $\theta$ as $\theta_0 = 1$ and the true form of $g(\cdot)$ as $g_0(v) = v$, and consider the following cases for $\{X_t\}$.

(i) $X_t = U_t$, where $\{U_t\}$ is a sequence of i.i.d. $N(0, 1)$ random variables.

(ii) $X_t = V_t + U_t$, where $\{U_t\}$ is defined as in case (i).

It is easy to check that the random walk $\{V_t\}$ defined in this example is a $\beta$-null recurrent process with $\beta = 1/2$ and that the assumptions in Section 2 are satisfied here. We choose sample sizes $n = 200, 700, 1200$ and use $N = 1000$ replications in the simulation. The simulation results are listed in Tables 1 and 2, and the plots are given in Figures 1-6.

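One replication of the design in Example 4.1 can be generated as below. Two details are our assumptions rather than statements from the text: the AR(1) recursion for $\{e_t\}$ is started at $e_0 = 0$ with no burn-in, and all draws come from NumPy's default generator. The function name is hypothetical.

```python
import numpy as np


def simulate_example_41(n, case, rng, theta0=1.0):
    """One sample (Y, X, V) from model (4.1) with g0(v) = v.

    case "i":  X_t = U_t          (H(v) = 0)
    case "ii": X_t = V_t + U_t    (H(v) = v)
    """
    V = np.cumsum(rng.normal(0.0, 0.1, size=n))  # V_t = V_{t-1} + v_t, V_0 = 0, v_t ~ N(0, 0.1^2)
    eta = rng.normal(0.0, 1.0, size=n)            # eta_t ~ N(0, 1)
    e = np.empty(n)
    prev = 0.0                                    # assumption: e_0 = 0
    for t in range(n):
        prev = 0.5 * prev + eta[t]                # e_t = 0.5 e_{t-1} + eta_t
        e[t] = prev
    U = rng.normal(0.0, 1.0, size=n)              # U_t ~ N(0, 1)
    if case == "i":
        X = U
    elif case == "ii":
        X = V + U
    else:
        raise ValueError("case must be 'i' or 'ii'")
    Y = X * theta0 + V + e                        # g0(V_t) = V_t
    return Y, X, V
```

For instance, `simulate_example_41(700, "ii", np.random.default_rng(1))` returns one sample of size 700 for case (ii).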
The performance of $\theta_n$ is given in Table 1. "AE" in Table 1 is defined by $\frac{1}{1000}\sum_{j=1}^{1000} |\theta(j) - \theta_0|$, where $\theta(j)$ is the value of $\theta_n$ in the $j$th replication. "SE" is



Table 2. Simulation results for the estimator of $g_0(v) = v$

n       H(·)        AE       SE
200     H(v) = 0    0.1158   0.0575
700     H(v) = 0    0.0894   0.0341
1200    H(v) = 0    0.0628   0.0210
200     H(v) = v    0.1391   0.0582
700     H(v) = v    0.1299   0.0437
1200    H(v) = v    0.1075   0.0367



Figure 1. Nonparametric estimate of the regression function $g_0(v)$ for the case of $H(v) = 0$ with sample size $n = 200$; the solid line is the true line, and the dashed curve is the estimated curve.

the standard error of $\{\theta(j)\}$. From Table 1, we find that the estimator of $\theta_0$ performs well for small and medium sample sizes and improves as the sample size increases.

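The two summary statistics in Table 1 can be computed from the $N$ replicated estimates $\theta(1), \ldots, \theta(N)$ as sketched below. Reading "standard error" as the sample standard deviation with divisor $N - 1$ is our assumption; the function name is hypothetical.

```python
import numpy as np


def ae_se(theta_hats, theta0):
    """AE = (1/N) sum_j |theta(j) - theta0| and SE = sample s.d. of {theta(j)}."""
    theta_hats = np.asarray(theta_hats, dtype=float)
    ae = float(np.mean(np.abs(theta_hats - theta0)))
    se = float(np.std(theta_hats, ddof=1))  # assumption: divisor N - 1
    return ae, se
```

For example, `ae_se([0.9, 1.1], 1.0)` gives an AE of about 0.1 and an SE of about 0.1414.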










Figure 2. Nonparametric estimate of the regression function $g_0(v)$ for the case of $H(v) = 0$ with sample size $n = 700$; the solid line is the true line, and the dashed curve is the estimated curve.
Figure 4. Nonparametric estimate of the regression function $g_0(v)$ for the case of $H(v) = v$ with sample size $n = 200$; the solid line is the true line, and the dashed curve is the estimated curve.
and $V_{\min}$ are the maximum and minimum of the random walk $\{V_t, 1 \le t \le n\}$, respectively. "SE" in Table 2 is the standard error. From Table 2, we find that the nonparametric estimate of $g_0(v) = v$ performs well in our example and improves as the sample size increases.







Figure 6. Nonparametric estimate of the regression function $g_0(v)$ for the case of $H(v) = v$ with sample size $n = 1200$; the solid line is the true line, and the dashed curve is the estimated curve.
Figures 1-3 compare the true nonparametric regression function $g_0(\cdot)$ with its nonparametric estimator for the case of $H(v) = 0$ when the sample sizes are 200, 700 and 1200, respectively. Figures 4-6 compare $g_0(\cdot)$ with its nonparametric estimator for the case of $H(v) = v$ when the sample sizes are 200, 700 and 1200, respectively. The solid line is $g_0(\cdot)$ and the dashed line is the nonparametric estimator. Because $\{V_t\}$ is non-stationary, we cannot know in advance which region its path will visit. Hence, we estimate $g_0(\cdot)$ only over the range of the observed $\{V_t\}$; outside this range, there are not enough observations in the neighborhood of each point to estimate $g_0(\cdot)$. That is why the ranges of the horizontal axes differ across Figures 1-6. We also find that the performance of the nonparametric estimate of $g_0(\cdot)$ improves as the sample size increases.

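The range restriction described above can be sketched as follows: $g_n(v) = \sum_t w_{nt}(v)(Y_t - X_t \theta_n)$ is evaluated only on a grid inside $[V_{\min}, V_{\max}]$, and grid points with too few nearby observations are left undefined. The uniform kernel, scalar $X_t$, the grid size and the hypothetical threshold `min_count` are assumptions of this sketch, not settings reported in the text.

```python
import numpy as np


def estimate_g_on_range(Y, X, V, theta_n, h, n_grid=50, min_count=5):
    """Evaluate g_n(v) = sum_t w_nt(v) (Y_t - X_t theta_n) on a grid in [V_min, V_max].

    Grid points whose window [v - h, v + h] holds fewer than `min_count`
    observations of V_t are returned as NaN (min_count is a hypothetical choice).
    """
    grid = np.linspace(V.min(), V.max(), n_grid)                   # stay inside the observed range
    K = 0.5 * (np.abs((grid[:, None] - V[None, :]) / h) <= 1.0)    # uniform kernel
    counts = (K > 0).sum(axis=1)                                   # local sample size at each grid point
    partial_resid = Y - X * theta_n
    g_hat = np.full(n_grid, np.nan)
    ok = counts >= min_count
    g_hat[ok] = (K[ok] @ partial_resid) / K[ok].sum(axis=1)
    return grid, g_hat, counts
```

Returning `counts` alongside the estimates makes it easy to see where the random walk spent too little time for a reliable local average.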
5. An empirical application 

We use monthly observations on U.S. share price indices, long-term government bond yields and treasury bill rates from January 1957 to December 2009. The data are obtained from the International Monetary Fund's (IMF) International Financial Statistics (IFS). The share price series is IFS Series 11162ZF. The long-term government bond yield, which is the 10-year yield, is IFS Series 11161ZF. The treasury bill rate is IFS Series 11160CZF. Figure 7(a)-(c) plots the treasury bill rates, the long-term bond yields and the share prices.

To check whether there is statistical evidence that the three series exhibit unit-root type non-stationarity, we carry out a Dickey-Fuller (DF) unit root test on

[Figure 7 panels: (a) Treasury Bill Rates; (b) Long-term Bond Yields. Horizontal axis: year.]
Figure 7. Time plots of the three series used in Section 5 over the period January 1957-December 2009 with 624 observations: (a) treasury bill rates; (b) long-term bond yields; (c) share prices.
Figure 8. Estimates of the nonparametric functions $H(\cdot)$ and $g(\cdot)$ in Case A.

the three series. We first fit the data by an AR(1) model of the form

$$Z_t = \rho Z_{t-1} + e_t,$$

where $Z_t$ is the share price, the long-term bond yield or the treasury bill rate at time $t$. Using the least-squares estimation method, we estimate the parameter $\rho$ for each of the three series: $\hat{\rho}_{\mathrm{share}} = 1.0023$ for the share price series, $\hat{\rho}_{\mathrm{Lbond}} = 0.9992$ for the long-term bond yield series and $\hat{\rho}_{\mathrm{Tbill}} = 0.9966$ for the treasury bill rate series. We then calculate the Dickey-Fuller $t$ statistics and compare them with the critical values at the 5% significance level. The simulated P values for the long-term bond yields, treasury bill rates and share prices are 0.7040, 0.3130 and 0.4410, respectively. In addition, we employ an augmented DF test and the nonparametric test proposed in Gao et al. [11] to check for a unit root structure in $\{Z_t\}$. The resulting P values are very similar to those obtained above.

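The AR(1) fit and a Dickey-Fuller $t$ statistic with a simulated P value can be sketched as below. The text does not say how its P values were simulated; here we assume a regression without intercept, a left-tailed test, and Gaussian random walks of the same length as the data under the null. The function names are hypothetical, and applied work would often add an intercept or trend, which this sketch omits.

```python
import numpy as np


def ar1_ls(z):
    """LS estimate of rho in Z_t = rho Z_{t-1} + e_t (no intercept) and the DF t statistic."""
    z = np.asarray(z, dtype=float)
    y, x = z[1:], z[:-1]
    rho = float(x @ y / (x @ x))
    resid = y - rho * x
    s2 = float(resid @ resid) / (y.size - 1)      # one estimated parameter
    t_stat = (rho - 1.0) / np.sqrt(s2 / float(x @ x))
    return rho, float(t_stat)


def simulated_df_pvalue(z, n_sim=999, rng=None):
    """Left-tailed P value of the DF t statistic, simulated under a Gaussian random-walk null."""
    rng = np.random.default_rng() if rng is None else rng
    _, t_obs = ar1_ls(z)
    n = len(z)
    t_null = np.array([ar1_ls(np.cumsum(rng.normal(size=n)))[1] for _ in range(n_sim)])
    # The add-one correction keeps the simulated P value strictly positive.
    return float((1 + np.sum(t_null <= t_obs)) / (n_sim + 1))
```

With a series stored in a hypothetical array `share`, `ar1_ls(share)` returns the pair $(\hat{\rho}, t)$ and `simulated_df_pvalue(share)` the corresponding simulated P value.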
Therefore, both the estimated values of $\rho$, all close to one, and the simulated P values, all well above 0.05, indicate that the null hypothesis of a unit root cannot be rejected at the 5% significance level for any of the three series.

We then consider the following modelling problem:

$$Y_t = X_t \theta + g(V_t) + e_t, \qquad X_t = H(V_t) + U_t,$$

where, in Case A, $Y_t$ is the share price, $X_t$ is the long-term bond yield and $V_t$ is the treasury bill rate; and, in Case B, $Y_t$ is the long-term bond yield, $X_t$ is the share price and $V_t$ is the treasury bill rate.

For Case A, the resulting estimator of $\theta_0$ is $\hat{\theta} = -3.2155$, and the estimates of $g(\cdot)$ and $H(\cdot)$ are plotted in Figure 8. For Case B, the resulting estimator of

Figure 9. Estimates of the nonparametric functions $H(\cdot)$ and $g(\cdot)$ in Case B.



$\theta_0$ is $\hat{\theta} = -0.0037$, and the estimates of $g(\cdot)$ and $H(\cdot)$ are plotted in Figure 9.

Figures 8 and 9 show that increases in treasury bill rates tend to lead to increases in long-term bond yields and decreases in share prices. These findings agree with financial theory and are consistent with existing studies. Moreover, Figures 7-9 indicate a new finding: the share price, long-term bond yield and treasury bill rate variables can simultaneously exhibit both null recurrent non-stationarity and nonlinearity.

Because of the cointegrating relationship among the stock price, treasury bill rate and long-term bond yield variables, our experience suggests that models (3.4) and (3.5) might be more suitable for this empirical study. We will revisit these data once models (3.4) and (3.5) have been fully studied.

6. An outline of the proofs of the theorems 

In this section, we provide only one key lemma and then an outline of the proofs of Theorems 3.1 and 3.2. The detailed proofs of the theorems are available from the supplemental document by Chen, Gao and Li [5].

Lemma 6.1. Under the conditions of Theorem 3.1, we have, as $n \to \infty$,
Proof of Theorem 3.1. In view of Lemma 6.1 and the decomposition

$$\tilde{X}^\top \tilde{X}(\theta_n - \theta_0) = \tilde{X}^\top(\tilde{Y} - \tilde{X}\theta_0) = \sum_{t=1}^n \tilde{X}_t \tilde{g}(V_t) F_t + \sum_{t=1}^n \tilde{X}_t e_t F_t - \sum_{t=1}^n \tilde{X}_t F_t \left(\sum_{k=1}^n w_{nk}(V_t) e_k\right),$$

in order to prove Theorem 3.1, we need only show that, for large enough $n$,

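$$\sum_{t=1}^n \tilde{X}_t \tilde{g}(V_t) F_t = o_P(\sqrt{n}), \qquad (6.2)$$

$$\sum_{t=1}^n \tilde{X}_t F_t \left(\sum_{k=1}^n w_{nk}(V_t) e_k\right) = o_P(\sqrt{n}), \qquad (6.3)$$

$$n^{-1/2} \sum_{t=1}^n \tilde{X}_t e_t F_t \to_D N(0, \Sigma_{e,u}), \qquad (6.4)$$

where $\tilde{g}(V_t) = g(V_t) - \sum_{k=1}^n w_{nk}(V_t) g(V_k)$. Recall that $\tilde{X}_t = X_t - \sum_{s=1}^n w_{ns}(V_t) X_s = U_t - \sum_{s=1}^n w_{ns}(V_t) U_s + \tilde{H}(V_t)$, where $\tilde{H}(V_t) = H(V_t) - \sum_{s=1}^n w_{ns}(V_t) H(V_s)$.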
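In order to prove (6.2)-(6.4), it suffices to show that, for large enough $n$,

$$\sum_{t=1}^n U_t \tilde{g}(V_t) F_t = o_P(\sqrt{n}), \qquad (6.5)$$

$$\sum_{t=1}^n \bar{U}_t \tilde{g}(V_t) F_t = o_P(\sqrt{n}), \qquad (6.6)$$

$$\sum_{t=1}^n \tilde{g}(V_t) \tilde{H}(V_t) F_t = o_P(\sqrt{n}), \qquad (6.7)$$

$$\sum_{t=1}^n U_t \bar{e}_t F_t = o_P(\sqrt{n}), \qquad (6.8)$$

$$\sum_{t=1}^n \bar{U}_t \bar{e}_t F_t = o_P(\sqrt{n}), \qquad (6.9)$$

$$\sum_{t=1}^n \tilde{H}(V_t) \bar{e}_t F_t = o_P(\sqrt{n}), \qquad (6.10)$$

$$\sum_{t=1}^n \bar{U}_t e_t F_t = o_P(\sqrt{n}), \qquad (6.11)$$

$$\sum_{t=1}^n \tilde{H}(V_t) e_t F_t = o_P(\sqrt{n}), \qquad (6.12)$$

$$n^{-1/2} \sum_{t=1}^n U_t e_t F_t \to_D N(0, \Sigma_{e,u}), \qquad (6.13)$$

where $\bar{U}_t = \sum_{s=1}^n w_{ns}(V_t) U_s$ and $\bar{e}_t = \sum_{s=1}^n w_{ns}(V_t) e_s$.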

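The algebra behind the decomposition of $\tilde{X}^\top \tilde{X}(\theta_n - \theta_0)$ and of $\tilde{X}_t$ can be checked numerically. The sketch below takes $F_t \equiv 1$ (no trimming), scalar $X_t$, uniform-kernel Nadaraya-Watson weights, $H(v) = v$ as in case (ii) of Example 4.1 and, purely for illustration, $g(v) = \sin v$; all of these are assumptions, and the function name is hypothetical. Both identities should hold up to rounding error.

```python
import numpy as np


def check_decomposition(n=300, h=0.3, theta0=1.0, seed=0):
    """Return both sides of the two identities used in the proof of Theorem 3.1 (F_t = 1)."""
    rng = np.random.default_rng(seed)
    V = np.cumsum(rng.normal(0.0, 0.1, n))   # random walk, as in Example 4.1
    U = rng.normal(size=n)
    e = rng.normal(size=n)
    H = V.copy()                             # H(v) = v
    g = np.sin(V)                            # illustrative smooth g
    X = H + U
    Y = X * theta0 + g + e

    K = 0.5 * (np.abs((V[:, None] - V[None, :]) / h) <= 1.0)
    W = K / K.sum(axis=1, keepdims=True)     # row t holds w_ns(V_t); the diagonal keeps rows nonzero

    def tilde(a):
        """a_t - sum_s w_ns(V_t) a_s."""
        return a - W @ a

    X_tld, Y_tld = tilde(X), tilde(Y)
    theta_n = (X_tld @ Y_tld) / (X_tld @ X_tld)

    lhs = (X_tld @ X_tld) * (theta_n - theta0)
    rhs = X_tld @ tilde(g) + X_tld @ e - X_tld @ (W @ e)   # the three sums, with e_bar = W e
    U_bar = W @ U
    return lhs, rhs, X_tld, U - U_bar + tilde(H)
```

Both checks are exact linear-algebra identities, so any discrepancy beyond rounding would point to an error in the weights rather than in the asymptotics.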
In the following, we verify equations (6.5)-(6.13) to complete the proofs of Theorem 3.1(i) and Theorem 3.1(ii). Note that, for Theorem 3.1(i), equations (6.7), (6.10) and (6.12) hold trivially.

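By the continuity of $g(\cdot)$ and $p_s(\cdot)$, we have, for $1 \le t \le n$,

$$\sum_{k=1}^n w_{nk}(V_t)\big(g(V_k) - g(V_t)\big) = \frac{\frac{1}{N(n)h}\sum_{k=1}^n K\left(\frac{V_k - V_t}{h}\right)\big(g(V_k) - g(V_t)\big)}{\frac{1}{N(n)h}\sum_{j=1}^n K\left(\frac{V_j - V_t}{h}\right)} = \frac{1}{N(n)h\, p_s(V_t)}\sum_{k=1}^n K\left(\frac{V_k - V_t}{h}\right)\big(g(V_k) - g(V_t)\big)\big(1 + o_P(1)\big). \qquad (6.14)$$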



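The link between (6.14) and (6.5) uses only the fact that the weights $w_{nk}(V_t)$ sum to one over $k$, which is also what allows $g(v)$ to be moved inside the sum in (6.17):

```latex
\tilde{g}(V_t) = g(V_t) - \sum_{k=1}^n w_{nk}(V_t)\, g(V_k)
             = \sum_{k=1}^n w_{nk}(V_t)\big(g(V_t) - g(V_k)\big)
             = -\sum_{k=1}^n w_{nk}(V_t)\big(g(V_k) - g(V_t)\big).
```

So, up to sign, $\tilde{g}(V_t)$ is exactly the quantity approximated in (6.14), which is why $\sum_t U_t \tilde{g}(V_t) F_t$ in (6.5) can be handled through $\sum_t U_t A_n(V_t) F_t$ in (6.15).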
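Thus, in view of (6.14) and Lemma 3.4 of Karlsen and Tjøstheim [20], in order to prove (6.5), it suffices to show that, for $n$ large enough,

$$\sum_{t=1}^n U_t A_n(V_t) F_t = o_P(\sqrt{n}), \qquad (6.15)$$

where $A_n(V_t) = \frac{1}{n^{\beta} L_s(n)\, h\, p_s(V_t)} \sum_{k=1}^n \big(g(V_k) - g(V_t)\big) K\left(\frac{V_k - V_t}{h}\right)$.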

This procedure of replacing $N(n)$ by $n^{\beta} L_s(n)$ and ignoring a smaller-order term, as in (6.14), will be used repeatedly throughout the proofs in Appendices B and C of the supplemental document.

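We then show that (6.6) holds. Similarly to (6.14) and (6.15), we need only show that

$$\sum_{t=1}^n \bar{U}_t A_n(V_t) F_t = o_P(\sqrt{n}), \qquad (6.16)$$

where, in the same way, $\bar{U}_t$ is replaced by $\frac{1}{n^{\beta} L_s(n)\, h\, p_s(V_t)} \sum_{k=1}^n K\left(\frac{V_k - V_t}{h}\right) U_k$.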

The detailed derivations for (6.15) and (6.16) are available from Appendix B of the 
supplemental document. The detailed proofs of (6.8), (6.9), (6.11) and (6.13) are also 
available from Appendix B. This will complete the proof of Theorem 3.1(i). 

We then prove Theorem 3.1(ii) by completing the proofs of (6.7), (6.10) and (6.12), which are again available from Appendix B of the supplemental document. □

Proof of Theorem 3.2. By the definition of $g_n(v)$, we have

$$g_n(v) - g(v) = \sum_{t=1}^n w_{nt}(v)(Y_t - X_t^\top \theta_n) - g(v) = \sum_{t=1}^n w_{nt}(v)\big(e_t + g(V_t) - g(v)\big) + \sum_{t=1}^n w_{nt}(v) X_t^\top (\theta_0 - \theta_n). \qquad (6.17)$$
Let $\Phi_{n,1} = \sum_{t=1}^n w_{nt}(v)\big(e_t + g(V_t) - g(v)\big)$ and $\Phi_{n,2} = \sum_{t=1}^n w_{nt}(v) X_t^\top (\theta_0 - \theta_n)$. Then we have

$$g_n(v) - g(v) = \sum_{t=1}^n w_{nt}(v)(Y_t - X_t^\top \theta_n) - g(v) = \Phi_{n,1} + \Phi_{n,2}. \qquad (6.18)$$



Since $\{e_t\}$ is assumed to be stationary and $\alpha$-mixing, by Corollary 5.1 of Hall and Heyde [14] and an existing technique for dealing with the bias term (see, e.g., the proof of Theorem 3.5 of Karlsen et al. [19]), we have, as $n \to \infty$,

$$\sqrt{\sum_{t=1}^n K\left(\frac{V_t - v}{h}\right)}\; \Phi_{n,1} \to_D N\left(0, \sigma^2 \int K^2(u)\, du\right). \qquad (6.19)$$



By (6.17)-(6.19), it is sufficient to show that

$$\sqrt{\sum_{t=1}^n K\left(\frac{V_t - v}{h}\right)}\; \Phi_{n,2} = o_P(1). \qquad (6.20)$$



The proof of (6.20) may then be completed by Theorem 3.1 and Assumptions A1-A6. 
The details are available from Appendix B of the supplemental document. This completes 
an outline of the proofs of Theorems 3.1 and 3.2. □ 
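In practice, (6.19) and (6.20) suggest an approximate pointwise interval for $g(v)$: since $\sqrt{\sum_t K((V_t - v)/h)}\,(g_n(v) - g(v))$ is approximately $N(0, \sigma^2 \int K^2(u)\,du)$ when the bias is negligible, $g_n(v) \pm 1.96\,\sigma \sqrt{\int K^2(u)\,du / \sum_t K((V_t - v)/h)}$ is a rough 95% interval. The sketch assumes scalar $X_t$, the uniform kernel (for which $\int K^2(u)\,du = 1/2$), Nadaraya-Watson weights and a user-supplied value `sigma2` for $\sigma^2$; it ignores the bias term, and the function name is hypothetical.

```python
import numpy as np


def pointwise_ci(v, Y, X, V, theta_n, h, sigma2, z=1.96):
    """Rough interval g_n(v) +/- z * sqrt(sigma2 * int K^2 / sum_t K((V_t - v) / h)).

    Uses the uniform kernel K(u) = 1/2 on [-1, 1], for which int K^2(u) du = 1/2,
    and ignores the bias of g_n(v).
    """
    K = 0.5 * (np.abs((V - v) / h) <= 1.0)
    s = K.sum()                                   # normaliser appearing in (6.19)
    if s == 0.0:
        raise ValueError("no observation of V_t within h of v")
    g_hat = float(K @ (Y - X * theta_n) / s)      # g_n(v) with Nadaraya-Watson weights
    half_width = z * np.sqrt(sigma2 * 0.5 / s)
    return g_hat - half_width, g_hat, g_hat + half_width
```

Because the normaliser is a sum over the observed path rather than $nh$, the interval automatically widens where the random walk has visited $v$ only rarely.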
Proofs of the theorems (DOI: 10.3150/10-BEJ344SUPP; .pdf). We provide this supplemental document for readers who wish to see the detailed proofs of Theorems 3.1 and 3.2 and Lemma 6.1. The details are available from Chen, Gao and Li [5].